
Documents authored by Brandt, Jim


Document
Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171)

Authors: Jim Brandt, Florina Ciorba, Ann Gentile, Michael Ott, and Torsten Wilde

Published in: Dagstuhl Reports, Volume 13, Issue 4 (2023)


Abstract
Advances in analytic approaches have brought within reach the vision of efficient High Performance Computing (HPC) operations, in which dynamic analysis drives automated feedback and adaptation. Many HPC centers have started developing and deploying frameworks to enable continuous and holistic monitoring, archiving, and analysis of performance data from their production machines and related infrastructures. The impact of such frameworks rests upon the ability to analyze such data effectively and to take action based on the analysis results. Analytic techniques have been successfully developed and applied in other domains, but their features may not apply directly to HPC operations data and situations. Response options are limited in HPC implementations. Leveraging, adapting, and extending analysis techniques and response options would open up new avenues for research and development of actionable analytics that can drive more intelligent operations through both manual and automated responses to conditions of interest. Dagstuhl Seminar 23171 brought together practitioners and researchers in the areas of HPC system management and monitoring, analytics, and computer science to work collaboratively on developing community solutions for revolutionizing HPC system operations. The topics discussed in this seminar spanned use cases, the data and analytic approaches required to address those use cases, the use of analysis results to improve performance and operations, and research in the development and use of autonomous feedback loops.

Cite as

Jim Brandt, Florina Ciorba, Ann Gentile, Michael Ott, and Torsten Wilde. Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171). In Dagstuhl Reports, Volume 13, Issue 4, pp. 98-120, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@Article{brandt_et_al:DagRep.13.4.98,
  author =	{Brandt, Jim and Ciorba, Florina and Gentile, Ann and Ott, Michael and Wilde, Torsten},
  title =	{{Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171)}},
  pages =	{98--120},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{13},
  number =	{4},
  editor =	{Brandt, Jim and Ciorba, Florina and Gentile, Ann and Ott, Michael and Wilde, Torsten},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.13.4.98},
  URN =		{urn:nbn:de:0030-drops-192403},
  doi =		{10.4230/DagRep.13.4.98},
  annote =	{Keywords: Monitoring, Operational Data Analytics, Dagstuhl Seminar, WAFVR}
}